notebook.community

Edit and run

Breakdown of network to make sense of components.

It was interesting to see the manually constructed loss function

multi input network

shared weights

Need to clean up the notebook a bit



In [1]:

    
'''Train a Siamese MLP on pairs of digits from the MNIST dataset.
It follows Hadsell-et-al.'06 [1] by computing the Euclidean distance on the
output of the shared network and by optimizing the contrastive loss (see paper
for mode details).
[1] "Dimensionality Reduction by Learning an Invariant Mapping"
    http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf
Gets to 99.5% test accuracy after 20 epochs.
3 seconds per epoch on a Titan X GPU
'''









    Out[1]:





'Train a Siamese MLP on pairs of digits from the MNIST dataset.\nIt follows Hadsell-et-al.\'06 [1] by computing the Euclidean distance on the\noutput of the shared network and by optimizing the contrastive loss (see paper\nfor mode details).\n[1] "Dimensionality Reduction by Learning an Invariant Mapping"\n    http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf\nGets to 99.5% test accuracy after 20 epochs.\n3 seconds per epoch on a Titan X GPU\n'



In [2]:

    
import numpy as np
import random



In [3]:

    
# Keras imports
from keras.datasets import mnist
from keras.models import Sequential, Model
from keras.layers import Dense, Dropout, Input, Lambda
from keras.optimizers import RMSprop
from keras import backend as K









    



Using TensorFlow backend.



In [4]:

    
def euclidean_distance(vects):
    x, y = vects
    return K.sqrt(K.maximum(K.sum(K.square(x - y), axis=1, keepdims=True), K.epsilon()))



In [5]:

    
def eucl_dist_output_shape(shapes):
    shape1, shape2 = shapes
    return (shape1[0], 1)



In [6]:

    
def contrastive_loss(y_true, y_pred):
    '''Contrastive loss from Hadsell-et-al.'06
    http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf
    '''
    margin = 1
    return K.mean(y_true * K.square(y_pred) +
                  (1 - y_true) * K.square(K.maximum(margin - y_pred, 0)))



In [7]:

    
def create_pairs(x, digit_indices):
    '''Positive and negative pair creation.
    Alternates between positive and negative pairs.
    '''
    pairs = []
    labels = []
    n = min([len(digit_indices[d]) for d in range(10)]) - 1
    for d in range(10):
        for i in range(n):
            z1, z2 = digit_indices[d][i], digit_indices[d][i + 1]
            pairs += [[x[z1], x[z2]]]
            inc = random.randrange(1, 10)
            dn = (d + inc) % 10
            z1, z2 = digit_indices[d][i], digit_indices[dn][i]
            pairs += [[x[z1], x[z2]]]
            labels += [1, 0]
    return np.array(pairs), np.array(labels)



In [8]:

    
def create_base_network(input_dim):
    '''Base network to be shared (eq. to feature extraction).
    '''
    seq = Sequential()
    seq.add(Dense(128, input_shape=(input_dim,), activation='relu'))
    seq.add(Dropout(0.1))
    seq.add(Dense(128, activation='relu'))
    seq.add(Dropout(0.1))
    seq.add(Dense(128, activation='relu'))
    return seq



In [9]:

    
def compute_accuracy(predictions, labels):
    '''Compute classification accuracy with a fixed threshold on distances.
    '''
    return labels[predictions.ravel() < 0.5].mean()



In [10]:

    
# the data, shuffled and split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()



In [11]:

    
x_train.shape, y_train.shape









    Out[11]:





((60000, 28, 28), (60000,))



In [12]:

    
x_test.shape, y_test.shape









    Out[12]:





((10000, 28, 28), (10000,))



In [13]:

    
x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

So basically here we are just flattening the images dimensions



In [14]:

    
# normalize
x_train /= 255
x_test /= 255
# flatten images
input_dim = 784
epochs = 20



In [ ]:



In [ ]:



In [ ]:



In [15]:

    
# create training+test positive and negative pairs
digit_indices = [np.where(y_train == i)[0] for i in range(10)]
tr_pairs, tr_y = create_pairs(x_train, digit_indices)

digit_indices = [np.where(y_test == i)[0] for i in range(10)]
te_pairs, te_y = create_pairs(x_test, digit_indices)



In [67]:

    
tr_pairs.shape









    Out[67]:





(108400, 2, 784)

so this dimensions are basically:

108400 is the number of items in the list

2 each entry in the list has 2 parts (which are the pairs)

784 is the dimension of the flatten image

scratch that. There is no list anymore since the function calls for np.array() before returning



In [66]:

    
tr_pairs[:,0].shape









    Out[66]:





(108400, 784)



In [68]:

    
tr_pairs[:,1].shape









    Out[68]:





(108400, 784)



In [81]:

    
tr_y.shape









    Out[81]:





(108400,)



In [17]:

    
len(digit_indices)









    Out[17]:





10



In [23]:

    
np.where(y_test == 2)









    Out[23]:





(array([   1,   35,   38, ..., 9980, 9985, 9995]),)

np.where returns a list, so you need the [0] to get a array only



In [24]:

    
np.where(y_test == 2)[0]









    Out[24]:





array([   1,   35,   38, ..., 9980, 9985, 9995])

so the code above creates an array for each of the numbers, each array has the indices for each of the numbers

create_pairs function

def create_pairs(x, digit_indices):
    '''Positive and negative pair creation.
    Alternates between positive and negative pairs.
    '''
    pairs = []
    labels = []
    n = min([len(digit_indices[d]) for d in range(10)]) - 1
    for d in range(10):
        for i in range(n):
            z1, z2 = digit_indices[d][i], digit_indices[d][i + 1]
            pairs += [[x[z1], x[z2]]]
            inc = random.randrange(1, 10)
            dn = (d + inc) % 10
            z1, z2 = digit_indices[d][i], digit_indices[dn][i]
            pairs += [[x[z1], x[z2]]]
            labels += [1, 0]
    return np.array(pairs), np.array(labels)



In [40]:

    
digit_indices[0][2], digit_indices[0][2+1]









    Out[40]:





(13, 25)



In [41]:

    
z1, z2 = digit_indices[0][2], digit_indices[0][2+1]

so we are getting the indices of two intances of the same number



In [43]:

    
x_test[z1][:10]









    Out[43]:





array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.], dtype=float32)



In [44]:

    
x_test[z2][:10]









    Out[44]:





array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.], dtype=float32)

then we are concatenating them



In [51]:

    
pairs += [[x_test[z1], x_test[z2]]]



In [52]:

    
len(pairs)









    Out[52]:





3

so we keep building the pairs



In [53]:

    
inc = random.randrange(1, 10)



In [54]:

    
inc









    Out[54]:





2



In [55]:

    
d = 4



In [56]:

    
dn = (d + inc) % 10



In [57]:

    
dn









    Out[57]:





6

here we are building a pair with a true and false instance thats why the inc on dn



In [ ]:

    
# this adds the negative pair
# z1, z2 = digit_indices[d][i], digit_indices[dn][i]
# pairs += [[x[z1], x[z2]]]

the label is 1 for the first example and zero for the fake that we just made



In [ ]:

    
# labels += [1, 0]



In [ ]:



In [ ]:



In [ ]:



In [ ]:



In [26]:

    
pairs = []
labels = []



In [27]:

    
n = min([len(digit_indices[d]) for d in range(10)]) - 1
# how many indices are the per number
# get the min of all the indices



In [28]:

    
n









    Out[28]:





891



In [29]:

    
for d in range(10):
        print(len(digit_indices[d]))



In [30]:

    
# then it substract 1 - I assume its because the range is non-inclusive



In [35]:

    
n = 10
ran = 5



In [70]:

    
l = []



In [75]:

    
l += [1, 0]



In [76]:

    
l









    Out[76]:





[1, 0, 1, 0, 1, 0]



In [37]:

    
for d in range(ran):
    for i in range(n):
        z1, z2 = digit_indices[d][i], digit_indices[d][i + 1]
        print(z1.shape, z2.shape)
        pairs += [[x_train[z1], x_train[z2]]]
        print(pairs)
        inc = random.randrange(1, 10)
        dn = (d + inc) % 10
        z1, z2 = digit_indices[d][i], digit_indices[dn][i]
        pairs += [[x_train[z1], x_train[z2]]]
        labels += [1, 0]
np.array(pairs), np.array(labels)









    



() ()






    



IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_data_rate_limit`.






    



---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-37-5a45634537dc> in <module>()
      4         print(z1.shape, z2.shape)
      5         pairs += [[x_train[z1], x_train[z2]]]
----> 6         print(pairs)
      7         inc = random.randrange(1, 10)
      8         dn = (d + inc) % 10

/Users/jaimealmeida/anaconda/envs/dl/lib/python3.6/site-packages/numpy/core/numeric.py in array_repr(arr, max_line_width, precision, suppress_small)
   1896     if arr.size > 0 or arr.shape == (0,):
   1897         lst = array2string(arr, max_line_width, precision, suppress_small,
-> 1898                            ', ', "array(")
   1899     else:  # show zero-length shape unless it is (0,)
   1900         lst = "[], shape=%s" % (repr(arr.shape),)

/Users/jaimealmeida/anaconda/envs/dl/lib/python3.6/site-packages/numpy/core/arrayprint.py in array2string(a, max_line_width, precision, suppress_small, separator, prefix, style, formatter)
    461     else:
    462         lst = _array2string(a, max_line_width, precision, suppress_small,
--> 463                             separator, prefix, formatter=formatter)
    464     return lst
    465 

/Users/jaimealmeida/anaconda/envs/dl/lib/python3.6/site-packages/numpy/core/arrayprint.py in _array2string(a, max_line_width, precision, suppress_small, separator, prefix, formatter)
    334     lst = _formatArray(a, format_function, len(a.shape), max_line_width,
    335                        next_line_prefix, separator,
--> 336                        _summaryEdgeItems, summary_insert)[:-1]
    337     return lst
    338 

/Users/jaimealmeida/anaconda/envs/dl/lib/python3.6/site-packages/numpy/core/arrayprint.py in _formatArray(a, format_function, rank, max_line_len, next_line_prefix, separator, edge_items, summary_insert)
    505 
    506         for i in range(trailing_items, 1, -1):
--> 507             word = format_function(a[-i]) + separator
    508             s, line = _extendLine(s, line, word, max_line_len, next_line_prefix)
    509 

/Users/jaimealmeida/anaconda/envs/dl/lib/python3.6/site-packages/numpy/core/arrayprint.py in __call__(self, x, strip_zeros)
    615                 else:
    616                     return self.special_fmt % (_nan_str,)
--> 617             elif isinf(x):
    618                 if x > 0:
    619                     if self.sign:

KeyboardInterrupt:



In [ ]:



In [ ]:



In [ ]:



In [ ]:



In [59]:

    
# network definition
base_network = create_base_network(input_dim)

input_a = Input(shape=(input_dim,))
input_b = Input(shape=(input_dim,))



In [61]:

    
base_network.summary()









    



_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 128)               100480    
_________________________________________________________________
dropout_1 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 128)               16512     
_________________________________________________________________
dropout_2 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_3 (Dense)              (None, 128)               16512     
=================================================================
Total params: 133,504.0
Trainable params: 133,504.0
Non-trainable params: 0.0
_________________________________________________________________



In [62]:

    
# because we re-use the same instance `base_network`,
# the weights of the network will be shared across the two branches
processed_a = base_network(input_a)
processed_b = base_network(input_b)

distance = Lambda(euclidean_distance,
                  output_shape=eucl_dist_output_shape)([processed_a, processed_b])

model = Model([input_a, input_b], distance)



In [63]:

    
model.summary()









    



____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
input_1 (InputLayer)             (None, 784)           0                                            
____________________________________________________________________________________________________
input_2 (InputLayer)             (None, 784)           0                                            
____________________________________________________________________________________________________
sequential_1 (Sequential)        (None, 128)           133504                                       
____________________________________________________________________________________________________
lambda_1 (Lambda)                (None, 1)             0                                            
====================================================================================================
Total params: 133,504.0
Trainable params: 133,504.0
Non-trainable params: 0.0
____________________________________________________________________________________________________



In [ ]:

    
# loss is specified manually
# optimizer is RMSprop



In [77]:

    
tset = [tr_pairs[:, 0], tr_pairs[:, 1]]



In [79]:

    
len(tset)









    Out[79]:





2



In [80]:

    
len(tr_y)









    Out[80]:





108400



In [28]:

    
# train
rms = RMSprop()
model.compile(loss=contrastive_loss, optimizer=rms)
model.fit([tr_pairs[:, 0], tr_pairs[:, 1]], tr_y,
          batch_size=128,
          epochs=epochs,
          validation_data=([te_pairs[:, 0], te_pairs[:, 1]], te_y))









    



Train on 108400 samples, validate on 17820 samples
Epoch 1/20
108400/108400 [==============================] - 5s - loss: 0.0960 - val_loss: 0.0432
Epoch 2/20
108400/108400 [==============================] - 5s - loss: 0.0406 - val_loss: 0.0294
Epoch 3/20
108400/108400 [==============================] - 5s - loss: 0.0277 - val_loss: 0.0248
Epoch 4/20
108400/108400 [==============================] - 5s - loss: 0.0221 - val_loss: 0.0241
Epoch 5/20
108400/108400 [==============================] - 5s - loss: 0.0190 - val_loss: 0.0235
Epoch 6/20
108400/108400 [==============================] - 5s - loss: 0.0165 - val_loss: 0.0219
Epoch 7/20
108400/108400 [==============================] - 5s - loss: 0.0148 - val_loss: 0.0219
Epoch 8/20
108400/108400 [==============================] - 5s - loss: 0.0136 - val_loss: 0.0221
Epoch 9/20
108400/108400 [==============================] - 5s - loss: 0.0125 - val_loss: 0.0221
Epoch 10/20
108400/108400 [==============================] - 5s - loss: 0.0119 - val_loss: 0.0233
Epoch 11/20
108400/108400 [==============================] - 5s - loss: 0.0113 - val_loss: 0.0228
Epoch 12/20
108400/108400 [==============================] - 5s - loss: 0.0108 - val_loss: 0.0221
Epoch 13/20
108400/108400 [==============================] - 5s - loss: 0.0105 - val_loss: 0.0228
Epoch 14/20
108400/108400 [==============================] - 5s - loss: 0.0101 - val_loss: 0.0228
Epoch 15/20
108400/108400 [==============================] - 5s - loss: 0.0099 - val_loss: 0.0226
Epoch 16/20
108400/108400 [==============================] - 5s - loss: 0.0091 - val_loss: 0.0220
Epoch 17/20
108400/108400 [==============================] - 5s - loss: 0.0091 - val_loss: 0.0241
Epoch 18/20
108400/108400 [==============================] - 5s - loss: 0.0086 - val_loss: 0.0240
Epoch 19/20
108400/108400 [==============================] - 5s - loss: 0.0085 - val_loss: 0.0239
Epoch 20/20
108400/108400 [==============================] - 5s - loss: 0.0083 - val_loss: 0.0230






    Out[28]:





<keras.callbacks.History at 0x1118ef940>



In [14]:

    
# compute final accuracy on training and test sets
pred = model.predict([tr_pairs[:, 0], tr_pairs[:, 1]])
tr_acc = compute_accuracy(pred, tr_y)
pred = model.predict([te_pairs[:, 0], te_pairs[:, 1]])
te_acc = compute_accuracy(pred, te_y)

print('* Accuracy on training set: %0.2f%%' % (100 * tr_acc))
print('* Accuracy on test set: %0.2f%%' % (100 * te_acc))









    



* Accuracy on training set: 99.97%
* Accuracy on test set: 99.62%



In [ ]: